XML for the absolute beginner
A guided tour from HTML to processing XML with Java

Page 9 of 10
XML and Java
Up to this point I've been laying out general information about XML,
without much reference to Java. Now that you understand XML, it's time to
look at how to process XML in Java. Java's a great language for XML, as
you'll see: XML provides a portable data format that nicely complements
Java's portable code.
SAX appeal
The easiest way to process an XML file
in Java is by using the Simple API for XML, or SAX. SAX is a
simple Java interface that many Java parsers can use. A SAX
parser is a class that implements the interface
org.xml.sax.Parser. This parser "walks" the tree of document
nodes in an XML file, calling the methods of user-defined handler classes.
To process an XML document, the programmer creates a class that
implements interface org.xml.sax.DocumentHandler. The
Parser object (that is, the object that implements
org.xml.sax.Parser) reads the XML from its input source,
calling the methods of the DocumentHandler when tags, input
strings, and so on are recognized at the input.
The methods of the DocumentHandler interface are as shown
in Listing 9.
public interface DocumentHandler {
    public abstract void setDocumentLocator (Locator locator);
    public abstract void startDocument () throws SAXException;
    public abstract void endDocument () throws SAXException;
    public abstract void startElement (String name, AttributeList atts)
        throws SAXException;
    public abstract void endElement (String name) throws SAXException;
    public abstract void characters (char ch[], int start, int length)
        throws SAXException;
    public abstract void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException;
    public abstract void processingInstruction (String target, String data)
        throws SAXException;
}
Listing 9. interface org.xml.sax.DocumentHandler
Package org.xml.sax includes a utility class called
HandlerBase, which implements the interface in Listing 9 (as
well as some other interfaces in the SAX package) with methods that do
nothing. Programmers can create a subclass of HandlerBase
that overrides only the methods they want to use.
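To get a feel for the order in which a parser invokes these callbacks, here is a minimal sketch of a HandlerBase subclass that records each event it receives. The class names CallbackTraceSketch and TraceHandler are made up for illustration, and the sketch assumes a current JDK, where a SAX 1 Parser can be obtained through javax.xml.parsers.SAXParserFactory (an API that is not part of the SAX package described in this article).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

@SuppressWarnings("deprecation") // the SAX 1 types are deprecated in current JDKs
class CallbackTraceSketch {

    // Hypothetical handler: records callbacks in the order they arrive.
    // Callbacks that are not overridden fall through to HandlerBase's no-ops.
    static class TraceHandler extends HandlerBase {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void endDocument() { events.add("endDocument"); }

        @Override public void startElement(String name, AttributeList atts) {
            flushText();
            events.add("startElement " + name + " (" + atts.getLength() + " attribute(s))");
        }

        @Override public void endElement(String name) {
            flushText();
            events.add("endElement " + name);
        }

        // A parser may deliver one run of text in several chunks, so buffer it.
        @Override public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        // Emit any buffered text as a single "characters" event.
        private void flushText() {
            if (text.length() > 0) {
                events.add("characters \"" + text + "\"");
                text.setLength(0);
            }
        }
    }

    static List<String> trace(String xml) throws Exception {
        // Assumption: the JDK's JAXP factory supplies the SAX 1 Parser.
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        TraceHandler handler = new TraceHandler();
        parser.setDocumentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<note to=\"you\">hi</note>").forEach(System.out::println);
    }
}
```

Running main() prints one line per recorded callback, which makes it easy to see how tags and text in the input map onto handler methods.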
For example, say we want a class that counts the elements in an XML
document. We could write a class as follows:
import org.xml.sax.*;

public class ElementCounter extends HandlerBase {

    protected int _iElements = 0;

    public ElementCounter() {
    }

    // Each time the SAX parser encounters an element, it
    // will call this method
    public void startElement (String name, AttributeList atts)
            throws SAXException {
        _iElements++;
    }

    public void endDocument() {
        System.out.println("Document contains " + _iElements + " elements.");
    }
}
Listing 10. A class that counts the elements in an XML document
To create a Java program that counts elements in an XML file, you'd
simply create a SAX parser (how you do that depends on your particular
parser package), then create an instance of your
ElementCounter class. You then call the parser's
setDocumentHandler method with the new
ElementCounter as an argument. The parser keeps a reference
to the DocumentHandler you passed to it. When you call the
parser's parse() method, the parser reads its input source.
Each time it encounters the start of an element (that is, an opening tag)
in the XML file, it calls the startElement() method of your
ElementCounter object, passing the name of the tag and a list
of any attributes the tag has.
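The steps just described can be sketched as follows. This is only one way to do it: how you create the parser depends on your parser package, and the sketch assumes a current JDK, where javax.xml.parsers.SAXParserFactory can hand back a SAX 1 Parser. The counter is restated here with a hypothetical getCount() accessor (not part of Listing 10) so the result can be returned rather than printed, and the class name CountElementsSketch is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // the SAX 1 types are deprecated in current JDKs
class CountElementsSketch {

    // Same idea as Listing 10; getCount() is an addition for this sketch.
    static class ElementCounter extends HandlerBase {
        private int elements = 0;

        @Override
        public void startElement(String name, AttributeList atts) throws SAXException {
            elements++; // called once per opening tag
        }

        int getCount() {
            return elements;
        }
    }

    static int countElements(String xml) throws Exception {
        // 1. Create a SAX parser (assumption: via the JDK's JAXP factory).
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();

        // 2. Create the handler and register it with the parser.
        ElementCounter counter = new ElementCounter();
        parser.setDocumentHandler(counter);

        // 3. Parse; the parser calls back into the handler as it reads.
        parser.parse(new InputSource(new StringReader(xml)));
        return counter.getCount();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<recipe><title/><ingredient/><ingredient/></recipe>";
        System.out.println("Document contains " + countElements(xml) + " elements.");
    }
}
```

Reading from a file instead of a string only changes step 3: pass an InputSource built from the file's URL or stream.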
Experimenting with SAX
An example package,
com.javaworld.JavaBeans.XMLApr99, can be downloaded for free
(see Resources).
The sample main() program lets you specify (in this order):
- An XML file to parse
- The fully specified class name of the parser (optional)
- The fully specified class name of a document handler
The package includes two document handlers: the
ElementCounter from Listing 10, and a handler called
SimplePrinter, which (naturally) simply prints the XML with
an easy-to-read indentation. You can try writing your own document handler
and passing it to the main method (called
com.javaworld.JavaBeans.XMLApr99.ParseDemo.main()).
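Passing a handler by its fully specified class name implies that the program instantiates it reflectively. The sketch below shows one way that could work; it is not taken from ParseDemo, whose source is in the download. The names HandlerByNameSketch, loadHandler, and parseWith are hypothetical, and the parser is again obtained from the JDK's javax.xml.parsers.SAXParserFactory as an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DocumentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

@SuppressWarnings("deprecation") // the SAX 1 types are deprecated in current JDKs
class HandlerByNameSketch {

    // Hypothetical helper: load a class by name, check that it implements
    // DocumentHandler, and create an instance with its no-argument constructor.
    static DocumentHandler loadHandler(String className) throws Exception {
        return Class.forName(className)
                .asSubclass(DocumentHandler.class) // ClassCastException if it is not a handler
                .getDeclaredConstructor()
                .newInstance();
    }

    static void parseWith(String handlerClassName, String xml) throws Exception {
        // Assumption: the JDK's JAXP factory supplies the SAX 1 Parser.
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        parser.setDocumentHandler(loadHandler(handlerClassName));
        parser.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Default to HandlerBase, which ignores every event it receives.
        String name = args.length > 0 ? args[0] : "org.xml.sax.HandlerBase";
        parseWith(name, "<a><b/></a>");
        System.out.println("Parsed with handler " + name);
    }
}
```

Any handler you write yourself needs a public no-argument constructor and must be on the CLASSPATH for this kind of lookup to find it.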
You'll need the JAR file called "XMLApr99.jar," and you'll need to
download the JAR file for IBM's excellent "XML for Java" package (version
2). Place both JAR files in your CLASSPATH, and type
java com.javaworld.JavaBeans.XMLApr99.ParseDemo
for instructions. The XML for Java package includes excellent
documentation, a programmer's guide, and several example programs to get
you started.
The source code is also available in zip and tar.gz formats. As an
exercise, try downloading one of the other vendors' XML parsers from the
Resources
section, and then overriding the method
ParseDemo.createParser() in the sample code to create a
parser from the new package.
Resources
There are so
many XML resources on the Web, I've had to categorize. The first section
here is the most useful, since the documents are either high-level
summaries or excellent link sites. Apologies to anyone who was omitted.
XML and Java: General XML resources
- "XML, Java and the Future of the Web," Jon Bosak. The paper that
started it all, at least from a Java programmer's point of view.
Definitely worth a read, even if it's a bit dated. Jon is commonly
considered to be the father of XML. Funny how all of these technologies
seem to have paternity:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- "Media-Independent Publishing: Four Myths about XML," Jon Bosak:
http://metalab.unc.edu/pub/sun-info/standards/xml/why/4myths.htm
- Robin Cover's XML-SGML site is, according to my SGML buddies, the
bible of XML resources:
http://www.oasis-open.org/cover/
- The W3C's XML resource page lets you cheer from the sidelines as XML
technology proposals develop into recommendations, or join in the fray
on their active mailing lists:
http://www.w3.org/XML/
- OASIS, the Web site of the Organization for the Advancement of
Structured Information Standards, offers general news and information
about XML:
http://www.oasis-open.org/
- The Graphics Communications Association, host of the XTech '99
conference (March 11 to 13, 1999, San Jose, CA) and the upcoming XML
Europe '99 conference in Granada, Spain (April 26 to 30, 1999), has a
Web site packed with XML information:
http://www.gca.org/
- XML.com is great for watching trends and digging up XML news:
http://www.xml.com/
- Textuality hosts Tim Bray's site. Check it out for a look at the
"big picture" of how XML fits into the structured document universe --
and for a look at Lark, Tim's nonvalidating XML processor:
http://www.textuality.com/
- The XML FAQ:
http://www.ucc.ie/xml/
- IBM's XML Website is an outstanding supplement to alphaWorks:
http://www.software.ibm.com/xml/index.html
XML and Java
- "XML and Java: The Perfect Pair" by Ken Sall (Internet.com, November
1998) provides information about XML, Java, and why these two are a
match made in heaven:
http://wdvl.com/Authoring/Languages/XML/Java/index.html
Tutorials and training
- Generally Markup, Richard Lander's Web site, may be of interest to
you if you haven't yet read enough about markup languages:
http://pdbeam.uwaterloo.ca/~rlander/
- The Mulberry Technologies Web site is a good resource for commercial
training in XML, as well as general XML and SGML consulting by seasoned
SGML experts:
http://www.mulberrytech.com/
- The Web Developer's Virtual Library Series on XML offers good
summaries of various XML technologies, as well as annotated indices of
XML software:
http://wdvl.com/Software/XML
- Microsoft's Site Builder Network provides a series of articles
called "Extreme XML," one of which appears in the following link. While
some of it focuses on Microsoft-only, Windows-only technology, there's
still some great stuff here:
http://www.microsoft.com/sitebuilder/magazine/xml.asp
- Webmonkey has a good series of articles introducing readers to XML.
The index is at:
http://www.hotwired.com/webmonkey/xml/?tw=xml
- "What the ?xml!" by L.C. Rees offers an interesting take on XML and
why it's necessary -- nicely written and entertaining to boot:
http://www.geocities.com/SiliconValley/Peaks/5957/wxml.html
- "The XML Revolution" by Dan Connolly is a quick backgrounder on XML
(Nature):
http://helix.nature.com/webmatters/xml.html
Cascading Style Sheets
- W3C's CSS page will get you started learning about CSS:
http://www.w3.org/Style/CSS/
- "Cascading Style Sheets: Designing for the Web" by Håkon Wium Lie and
Bert Bos (Addison-Wesley, 1997). Sample chapters from the book appear at:
http://www.awl.com/cseng/titles/0-201-41998-X/liebos/
Extensible Style Language (XSL)
- The W3C's XSL page:
http://www.w3.org/Style/XSL/
- Read (and comment on) the W3C's XSL Working Draft (currently dated
December 16, 1998):
http://www.w3.org/TR/WD-xsl
- "The Extensible Style Language: Styling XML Documents"
(WebTechniques Magazine) offers XSL tutorial information and examples:
http://www.webtechniques.com/features/1999/01/walsh/walsh.shtml
- Microsoft's XML and XSL tutorial site is especially interesting
because of the recent release of client-side XSL in Internet Explorer
5.0. Extensive and excellent:
http://www.microsoft.com/xml
- If you're still using IE 4.0, you can still experiment with XML,
using Microsoft's internal DOM:
http://www.microsoft.com/xml/articles/xmlmodel.asp
- If you want to experiment with XSL, try downloading IBM's LotusXSL.
It's all Java, and for the time being, it's free:
http://www.alphaworks.ibm.com/tech/LotusXSL
- Or, you can try James Clark's XT XSL engine, downloadable from:
http://www.jclark.com/xml/xt.html
Upcoming XSL contest
Though the details aren't yet worked out, Sun Microsystems will soon
announce a call for proposals for a $30,000 grant to develop a
client-side processor for full XSL implementation in Mozilla.
It will also announce, in conjunction with Adobe, a contest (first prize
$40,000, second prize $20,000) to develop a pure-Java, server-side
processor of the entire XSL language, to format XML to PDF (Adobe's
document format). Keep watching the Java Developer Connection (requires
free registration) and Mozilla sites for the eventual announcements.
- "XTech '99: Java and the XML wave" by Mark Johnson
(JavaWorld, April 1999) offers the most current information on
the contest:
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xtech.html
Simple API for XML (SAX)
- The definitive description of SAX is available online. You can also
download free SAX software here:
http://www.megginson.com/SAX/index.html
Document Object Model (DOM)
- The W3C information page for the Document Object Model appears on
the W3C site:
http://www.w3c.org/DOM/
- Among other things, you'll find the W3C Recommendation for DOM Level
1:
http://www.w3.org/TR/REC-DOM-Level-1/
- The Java bindings for DOM, for both XML and HTML, are in this
Recommendation appendix:
http://www.w3.org/TR/REC-DOM-Level-1/java-language-binding.html
- A great DOM tutorial by William Robert Stanek appears on PC
Magazine Online in "Object-Based Web Design." This tutorial
includes a discussion of using DOM with IDL, CORBA's Interface
Definition Language:
http://www8.zdnet.com/pcmag/pctech/content/17/13/tf1713.001.html
Dynamic HTML
- The Dynamic HTML Resource page contains several links to DHTML
articles:
http://www.hotwired.com/webmonkey/dynamic_html/?tw=dynamic_html
Software
- Epicentric, Inc.:
http://www.epicentric.com/
- More XML (and other Java) technology than you can shake a stick at
is available at IBM's alphaWorks:
http://alphaworks.ibm.com/
- Version 2 of IBM's excellent XML parser package, xml4j, is available
for download. This package includes several parsers, both validating and
nonvalidating:
http://www.alphaworks.ibm.com/tech/xml4j
- See also IBM's exciting Bean Markup Language project, which uses XML
to represent and manipulate JavaBeans:
http://www.alphaworks.ibm.com/tech/bml
- Another free Java XML parser was written by the indefatigable James
Clark. Download it at:
http://www.jclark.com/xml/xp/index.html
- XEENA is IBM alphaWorks's DTD-guided XML editor. You want it, you
need it, you gotta have it:
http://www.alphaworks.ibm.com/tech/xeena
- Mozilla.org is the open source community's effort to extend the
Netscape source code. Find out about it at:
http://www.mozilla.org/
- Information about XML and CSS in Mozilla appears at:
http://www.mozilla.org/rdf/doc/xml.html
- You can read about Sun's XML and Java initiatives at:
http://www.sun.com/990310/java_xml.jhtml
- In addition, Java Project X includes source code downloadable from:
http://developer.java.sun.com/developer/earlyAccess/xml/index.html
- ArborText has a suite of sophisticated tools for editing SGML, XML,
and XSL:
http://www.arbortext.com/Products/products.html
- Oracle8i from Oracle Corporation uses XML inside the Oracle core:
http://www.oracle.com/xml/
- Download Oracle's free XML for Java parser:
http://technet.oracle.com/direct/3xml.htm
- Microsoft's Internet Explorer 5.0, released this month, implements
part of the XSL spec. You can find it on Microsoft's Web site -- and
also just about anywhere else:
http://www.microsoft.com/windows/ie/default.htm
- You can also download a beta release of Microsoft's XML Notepad
editor (limited to running only on Microsoft Windows):
http://www.microsoft.com/xml/notepad/download.asp
- Vervet Logic of Bloomington, IN, has announced XML <PRO>, a
commercial XML editor:
http://www.vervet.com/
- Majix, to transform XML to HTML via XSL, is available at:
http://www.tetrasix.com/
- If your French is rusty, you might want to try the English-language
site at:
http://www.tetrasix.com/english/default.htm
History
- Read about the history of HTML here. It's part of an online book, so
there's no telling for how long it will be available:
http://ei.cs.vt.edu/~wwwbtb/hardcopy/book/chap4/origins.html
The two chapters listed below (from the book "HTML Unleashed" by Rick
Darnell et al.) also cover some of the technical background of these
languages.
- SGML history
http://www.webreference.com/dlab/books/html/3-2.html
- XML history (such as it is):
http://www.webreference.com/dlab/books/html/38-0.html
- Nothing to do on Friday night? Why not read up on the history of
SGML? Charles Goldfarb, considered by many to be the "father of SGML,"
reminisces publicly at:
http://www.sgmlsource.com/Goldfarb/history/index.htm
- Useful XML and SGML information appears at Goldfarb's Web site,
including a comprehensive XML book list:
http://www.sgmlsource.com/
Miscellaneous links
- Uche Ogbuji has written an interesting article in
LinuxWorld about using XML on Linux in the Enterprise. It's at:
http://www.linuxworld.com/linuxworld/lw-1999-03/lw-03-xml.html
- Bluestone Software has recently made a splash with pure-Java XML
application servers, and a freely downloadable Swing package called
XwingML:
http://www.bluestone.com/
- Everyone (except Microsoft) is pretty freaked out about the US
Patent Office awarding Microsoft a patent for certain kinds of
functionality in style sheets. What happens with this patent, and its
impact on developing technology, remains to be seen. Judge for yourself
by reading the patent at:
http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
- The title of the sample recipe is actually the title of a very funny
song by William Bolcom. Similar recipes may be found at:
http://www.b4uby.com/granny/gsoup.htm
- The song appears on a compact disc (with other odd songs) available
from the Public Radio Music Source at:
http://75music.org/best/docs/keepers.htm